Bitwise operators and more

Contributed by myndzi

Computers and numbers

You have probably heard the term 'binary,' and I bet you all know what it means too, so I won't get -too- detailed here. Suffice it to say that computers do math in binary, so I'll go over a few things you will need to know.

$asc and $chr

Long before the DOS days.. there were text files. (Wow!) There needed to be a standard way to represent the characters you would want to use within the text files. The American Standards Association (which later became the American National Standards Institute, ANSI) came up with that standard in the 1960s, dubbed ASCII.

The ASCII character set consists of 95 printable characters (counting the space) and 33 non-printable control characters (codes 0 through 31, plus 127). (Some character sets do give them 'pictures', but in text files they mean 'special' things, such as end of file, or newline.) This totals 128 characters, which covers every possible combination of seven binary bits (1s or 0s). A byte consists of 8 bits, which gives 256 possible combinations; however, the 8th bit was often used for error control (I won't go into parity in this post, brief as it may be).

The ASCII character set was later extended to 256 possible characters, using the 8th bit in the byte. The second set of 128 characters consists mostly of lines, boxes, and special characters such as accented e's, tilded n's, various currency symbols, bullets, and Greek letters.

The numbers 0 through 9 are also assigned ASCII values as follows:

0 = 48 ... 5 = 53
1 = 49 ... 6 = 54
2 = 50 ... 7 = 55
3 = 51 ... 8 = 56
4 = 52 ... 9 = 57

There is a 'rhyme and reason' to the way letters, symbols, and numbers were assigned their ASCII values, but I will talk about that when I give some examples of practical use of the bitwise operators.

Almost all of the 95 printable characters in the ASCII character set can be typed via the keyboard, but you'll need to do something different to type some of the extended characters. On your keyboard, in an editor such as Notepad, hold down the ALT key, and (on your number pad) press 1, 6, 4, then release ALT. You should end up with an ñ (the letter 'eñe', pronounced 'enyay' in Spanish). Now, repeat the process but this time press 0241 (the 0 is required). You should get the same thing.

This is because when Microsoft created Windows (tm) they decided to rearrange all the character assignments. Go figure. (Actually I think it may have something to do with newer versions of Windows and UNICODE, but I don't know, if you know why they did this please enlighten me :)

Now, for those of you who don't know about Character Map, check your 'Start:Programs:Accessories:System Tools' menu for it. If you don't have it, have someone send it to you, or install it from the Windows Setup tab of the 'Add/Remove Programs' Control Panel.

Character Map is a neat little utility that allows you to look up character codes for various special characters. Just keep in mind that the numbers it gives you are Microsoft's code for the characters, which will be different from whatever you would use in DOS.

mIRC's $asc() identifier will return the ASCII code for a single character. The number returned will be the "DOS code" for most (or all, I haven't checked) of the typeable characters, but for any of the extended ASCII characters, it will return the 'Windows' version. Notable here is the non-breaking space character, ALT+0160. Keep in mind that mIRC does not return the number padded with a 0 where applicable, so $asc() of that character will return '160', but if you want to type it in on the number pad, you will still need to use ALT+0160.

The $chr() identifier does the reverse of $asc(). It converts a numeric value into the equivalent ASCII character. It truly reverses $asc(), so $chr(160) will work as well as $chr(0160).
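For readers without mIRC at hand, the same round trip can be sketched in Python (not mIRC script), where the built-ins ord() and chr() stand in for $asc() and $chr(). This is only an analogy: Python works with Unicode code points, which match ASCII for codes 0 through 127, and digit_codes is a made-up helper name.

```python
# ord() and chr() are Python built-ins, used here as stand-ins for
# mIRC's $asc() and $chr(). digit_codes is a hypothetical helper.

def digit_codes():
    """Map each digit character '0'..'9' to its ASCII code."""
    return {digit: ord(digit) for digit in "0123456789"}

codes = digit_codes()
print(codes["0"], codes["9"])  # 48 57
print(chr(65))                 # A
print(ord(chr(160)))           # 160, the round trip gives the number back
```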

Before I move on, it is worth noting that there -are- a few fonts which display characters using the original, "DOS" character set, specifically Terminal. (Although there are others lying around, such as GwdTE_437.)

Now, when you use things such as $calc() in mIRC to do math, you don't see any of this happen. However, mIRC has to convert the -character- of each number presented to it into the -value- of that number before it can ask your processor to compute the result of a specific mathematical operation.

Take the number '65' as an example. The characters '6' and '5' represented in binary are the values '54' and '53' respectively. The value '65' in binary, is the value for the character 'A'. So, to do math involving the number '65', the computer must first translate the characters '6' and '5' into the value '65' (which would be printed as 'A', but we won't be seeing it in that form.)
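Here is a rough sketch of that translation step, in Python rather than mIRC script. It is not how mIRC itself is written (I have no idea what its internals look like); digits_to_value is a hypothetical name, and the only fact it relies on is that the digit characters have the codes 48 through 57.

```python
def digits_to_value(text):
    """Convert a string of decimal digit characters into the number it spells."""
    value = 0
    for ch in text:
        code = ord(ch)  # '6' -> 54, '5' -> 53
        if not 48 <= code <= 57:
            raise ValueError(f"not a decimal digit: {ch!r}")
        # Subtracting 48 turns the character code into the digit's value;
        # multiplying by 10 shifts the earlier digits one place to the left.
        value = value * 10 + (code - 48)
    return value

print(digits_to_value("65"))       # 65
print(chr(digits_to_value("65")))  # A
```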

I hope that made sense, I'm reaching for a way to describe it well ;) Post your analogies here!

OK! Now we have our number in a form that the computer can understand and manipulate. If we performed the same conversion on another number (represented as a string), we would have two values in a form that the computer can understand and manipulate. You could then add, subtract, divide, multiply, or perform a number of other calculations on the two numbers. This is where boolean, or bitwise, operators come in. They operate on numeric values in the same way addition and subtraction do, with the exception that they DO something different with those numbers :)

We're getting there...

Binary and Decimal
Some terms I will be using (these are by no means technical, correct, or even real, this is just what I mean when I use these words):

Characters/digits: When I am using it with regards to 'base' and math, I am referring to the possible values for a place in a number. As you've seen with Decimal, Hexadecimal, and Binary, this is not always 10 (0 through 9). I will avoid using the term 'numbers' because it is vague, and with bases higher than 10, you run out of numbers to represent place values with.

Base: This defines the number of possible characters used in a specific numbering system. It derives from the math behind different bases.

Place: Remember in elementary school math, how they referred to "the 1's place", "the 10's place", etc? They were exactly right :) However, with hexadecimal for example, you are really saying "the 1's place" and "the 16's place". This is only partly true because IN hex "16" would be written as "10".. so it is still the "10's place".. follow?

*** We will necessarily be talking about other bases IN base 10, because that is the only base the majority of us are really used to working with. ***

-----

Computers do math in base 2, called 'binary'. There are two possible characters to represent binary numbers: 0 and 1. (2 characters, base 2, get it? =D)

Binary is often represented in Hexadecimal (Hexa- = 6, Dec- = 10), in which there are sixteen possible characters: 0 through F. This is convenient because with four binary bits, there are a total of 16 possible values.. so two hexadecimal digits can exactly represent eight binary digits.

OK, base conversion deserves its own post, so I'll tell you the quick-and-dirty way: $base()

From mIRC's help file:

$base(N,inbase,outbase,zeropad,precision)
Converts number N from inbase to outbase. The last two parameters are optional.

To convert a number from decimal to binary:

$base(N,10,2)

To convert from binary to decimal:

$base(N,2,10)

I won't worry about dealing with hex, but you should be able to figure out how to convert to and from other bases... =P
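If you want to see what a conversion like this does under the hood, here is a small Python sketch (not mIRC script). convert_base is a hypothetical helper that mirrors only the first three parameters of $base(); it ignores zeropad and precision and only handles non-negative whole numbers.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def convert_base(text, inbase, outbase):
    """Convert a non-negative whole number written in inbase to outbase (2-36)."""
    value = int(text, inbase)  # read the digits using the input base
    if value == 0:
        return "0"
    out = []
    while value > 0:
        # The remainder is the next digit, starting from the 1's place.
        value, remainder = divmod(value, outbase)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

print(convert_base("170", 10, 2))       # 10101010
print(convert_base("10101010", 2, 10))  # 170
print(convert_base("255", 10, 16))      # FF
```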

Bitwise Operators (Also called boolean operators)
I use the term 'bitwise' for a reason: their purpose is more apparent by that name.

A bitwise operator will work on a binary number one bit at a time, independent of the other bits.

This is a good thing for you, because it means you only ever have to learn 4 things for each operator (well, 2 for NOT).

In fact, let's start with NOT.

Given a binary bit, a 1 or a 0, NOT will return the opposite. It returns the 'bitwise complement' of a binary number. $not() takes one parameter.

So, ALT+TAB over to the copy of mIRC I know you have running, and type the following:

//echo -a $not(0)

"But wait a second... You just said it returns the opposite .. !?"

You should have seen something like this:

4294967295

If not, you are probably running the 16 bit version of mIRC. Go get a real computer ;)

Up to this point I have only been talking about the binary values for a single character, which can be represented with a byte, or eight binary bits (1s or 0s).

However, mIRC32 is not named that way without good reason... mIRC can and does use 32 bits in its numerical calculations, especially the bitwise operators. 32 bits is 4 bytes long. What you saw with the //echo command was the value of the binary number '11111111 11111111 11111111 11111111' (I put spaces between each byte for clarity).

To quote an earlier paragraph:

"A bitwise operator will work on a binary number one bit at a time, independent of the other bits."

mIRC, working with 32 bits, assumed that the '0' you gave its $not() identifier was equivalent to '00000000 00000000 00000000 00000000'. It applied the 'NOT' operator to each and every bit, resulting in the long sequence of 1s you see above. After I explain AND and OR, I might as well expand on the concept of masking, and there I will show you how to restrict your bitwise operators to 8 bits.
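To see the same effect outside mIRC, here is a Python sketch of a 32-bit NOT. Python's integers are not limited to 32 bits, so the sketch chops the result down itself; not32 and MASK32 are hypothetical names, and this only imitates the result described above rather than mIRC's actual code.

```python
MASK32 = 0xFFFFFFFF  # 32 one-bits

def not32(n):
    """Flip every bit of n and keep only the low 32 bits of the result."""
    return ~n & MASK32

print(not32(0))                  # 4294967295
print(format(not32(0), "032b"))  # 11111111111111111111111111111111
```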

I wanted to gloss over signed and unsigned bytes etc, but it is probably relevant about now. In your copy of mIRC, try this:

//echo -a $not(-1)

You should see:

0

This is because of the way negative numbers are represented in binary (-1 is represented the same way as 2^32-1, or 32 1's)... but I might just save that for its own post as well. Suffice it to say that negative numbers will return odd-seeming results ;)
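A quick way to convince yourself of the "-1 looks like 32 ones" claim is this Python sketch. as_unsigned32 is a hypothetical helper that shows which unsigned number shares a 32-bit pattern with a given signed one; it assumes the usual two's-complement representation.

```python
MASK32 = 0xFFFFFFFF  # 32 one-bits

def as_unsigned32(n):
    """Return the unsigned number that shares n's 32-bit two's-complement pattern."""
    return n & MASK32

print(as_unsigned32(-1))     # 4294967295, the same pattern as 2^32-1
print(as_unsigned32(-2))     # 4294967294
print(as_unsigned32(~(-1)))  # 0, flipping 32 ones leaves 32 zeros
```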

I seem to have forgotten for a moment where I was going with all this.. (I -am- at work ;) .. oh yeah, the rest of the operators.

Ok, now you should understand the way bitwise operators work on a number. They affect the number a bit at a time, for each bit in the number. I am going to (try) to make some little tables to illustrate how AND, OR, and XOR work, then make one final post with examples and interesting information. Maybe two if it gets lengthy ;)

OK, tables it isn't. Here's the info anyway:

Putting it all together
OK, let's start with some examples. My examples are going to be in 8 bits, but keep in mind mIRC will be operating with 32.

My two example numbers will be 170 (in binary, '10101010') and 105 (in binary, '01101001')

For ease of alignment in variable-width fonts, I will do my 'math' vertically, with the operator on the RIGHT (yes, I know this is not how they taught math in school ;)

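*** NOT ***

10101010 NOT
----------
01010101

So, in 8 bits, NOT 170 == 85. Try it out in mIRC :) Keep in mind that mIRC flips all 32 bits, so $not(170) by itself returns 4294967125; $and($not(170),255) keeps only the low 8 bits and gives 85. Also try $base(N,2,10) where N is the result of any of these examples, to verify that the answer I am showing is correct.

01101001 NOT
----------
10010110

In 8 bits, NOT 105 == 150, or $and($not(105),255) in mIRC (don't get any shortcut ideas, this is just a coincidence ;)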
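Since mIRC flips all 32 bits, getting the 8-bit answers shown here takes one extra step: throw away everything above the low 8 bits. A Python sketch of that idea follows; not8 is a hypothetical name, and 0xFF is just 255 ('11111111') written in hex.

```python
def not8(n):
    """Flip the bits of n, then keep only the low 8 bits."""
    return ~n & 0xFF

print(not8(170))          # 85
print(not8(105))          # 150
# The full 32-bit NOT of 170 is 4294967125; masking it gives the same 85.
print(4294967125 & 0xFF)  # 85
```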

**

Now for the interesting ones. Be sure to look at the bits vertically, and see how each operator works towards the end result.

*** AND ***

10101010
01101001 AND
----------
00101000

$and(170,105) == 40

*** OR ***

10101010
01101001 OR
----------
11101011

$or(170,105) == 235

*** XOR ***

10101010
01101001 XOR
----------
11000011

$xor(170,105) == 195

**
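The three results above can be reproduced with Python's own &, | and ^ operators, which I am using here as stand-ins for $and(), $or() and $xor(). The show helper is hypothetical and only lines the bits up so you can read them vertically.

```python
def show(label, value):
    """Format an 8-bit value as 'label binary = decimal'."""
    return f"{label:<4}{value:08b} = {value}"

x, y = 170, 105
lines = [
    show("x", x),
    show("y", y),
    show("AND", x & y),  # 40
    show("OR", x | y),   # 235
    show("XOR", x ^ y),  # 195
]
print("\n".join(lines))
```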

Note that 'XOR' stands for 'exclusive OR'. It means that a result bit is 1 only if the two input bits are different from each other.

$or(X,Y) - $and(X,Y) == $xor(X,Y)

Try it:

//echo -a $calc($or(170,105) - $and(170,105)) : $xor(170,105)
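One example does not prove much, so here is a Python sketch that checks the identity for every pair of 8-bit numbers. It works because OR has a 1 wherever either number does, AND has a 1 only where both do, and those 'both' positions are exactly what XOR leaves out. identity_holds is a hypothetical name.

```python
def identity_holds(x, y):
    """Check that OR minus AND equals XOR for one pair of numbers."""
    return (x | y) - (x & y) == (x ^ y)

all_ok = all(identity_holds(x, y) for x in range(256) for y in range(256))
print(all_ok)  # True
```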

It is also useful to note that $xor($xor(X,Y),Y) == X .. If you xor a number by another number twice, you will end up with the first number. This is useful as a form of basic encryption, among other things :) Keep in mind that you probably don't want to use this for text encryption across IRC, as XORing two characters together may result in a character < 32, and only some of those will survive across the IRC server to the other person.
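Here is a toy Python sketch of that round trip, using a single made-up key number. It also shows the catch mentioned above: the scrambled codes can easily land below 32. xor_codes is a hypothetical helper, and this is nowhere near real encryption.

```python
def xor_codes(codes, key):
    """XOR every character code in the list with the same key number."""
    return [code ^ key for code in codes]

plain = [ord(ch) for ch in "Hi"]      # [72, 105]
scrambled = xor_codes(plain, 105)     # [33, 0]
restored = xor_codes(scrambled, 105)  # back to [72, 105]

print(scrambled)
print("".join(chr(code) for code in restored))  # Hi
print([code for code in scrambled if code < 32])  # [0], would not survive IRC
```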

"What possible use could I have for all this?"

Well, you'd be surprised what you can run into with various protocols and file formats, as well as things such as the recent Challenge where these operators are an unexpected boon.

One of the more useful applications of the bitwise operators involves bit masking. It is sort of like the way you can use modulo (the % operator) to "chop off" a number to a certain max. Well, it's more like 'wrapping around'. Since 7 modulo 5 would be 2:

0, 1, 2, 3, 4, 5, 6, 7
0, 1, 2, 3, 4, 0, 1, 2 <-- modulo 5 counting

In the same way, "chopping off" bits above a certain point in binary can 'wrap' a binary number around below a certain max.
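A Python sketch of that wrap-around follows. One assumption to flag: unlike modulo, this trick only gives you limits that are powers of 2 (4, 8, 256, ...), because the mask has to be a solid run of 1s. low_bits is a hypothetical name.

```python
def low_bits(n, bits):
    """Keep only the lowest `bits` bits of n by ANDing with a mask of ones."""
    mask = (1 << bits) - 1  # bits=2 gives 0b11, bits=8 gives 0b11111111
    return n & mask

wrapped = [low_bits(n, 2) for n in range(8)]
print(wrapped)                      # [0, 1, 2, 3, 0, 1, 2, 3], same as modulo 4
print(low_bits(300, 8), 300 % 256)  # 44 44
```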

You can also use binary masking to 'set' (make equal to '1') or 'clear' (make equal to '0') a single bit, or multiple bits, in a number's binary equivalent.

Bit masking

It goes something like this.

You have a number (again, I will use 8 bit binary representations of the numbers), and you want to change part of it. AND and OR provide excellent tools to do this.

Briefly:

To 'set' a bit:

Example:

Say I want to set the top three bits of the number 170 ('10101010').

I would OR it with the number 224:

11100000

Watch what happens.

10101010
11100000 OR
----------
11101010

Now, the top three bits of my original number are set, giving me the result, 234.

To 'clear' a bit:

Example:

Using the same number as above, I decide I want to clear the top three bits.

I would AND it with the number 31:

00011111

Watch what happens.

10101010
00011111 AND
----------
00001010

Result: 10
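Both recipes fit in a couple of lines of Python. In this sketch set_bits and clear_bits are hypothetical helpers working on 8-bit numbers; note that clearing uses the flipped mask, which is how 224 ('11100000') turns into the 31 ('00011111') used above.

```python
TOP_THREE = 0b11100000  # 224

def set_bits(n, mask):
    """Force every bit that is 1 in mask to 1; leave the others alone."""
    return n | mask

def clear_bits(n, mask):
    """Force every bit that is 1 in mask to 0 (8-bit numbers only)."""
    return n & (~mask & 0xFF)

print(set_bits(170, TOP_THREE))    # 234
print(clear_bits(170, TOP_THREE))  # 10
print(~TOP_THREE & 0xFF)           # 31
```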

Why does it work?

When you OR a bit with 0, you are guaranteed that the result will be the same as the bit you started with. This is because 0 OR 0 is 0, and 1 OR 0 is 1.

When you AND a bit with 1, you are in the same way guaranteed that the result will be the same as the bit you started with. 0 AND 1 is 0, 1 AND 1 is 1.

However, when you OR a number with 1, the result will always be 1: 0 OR 1 is 1, 1 OR 1 is 1.

When you AND a number with 0, the result will always be 0: 0 AND 0 is 0, 1 AND 0 is 0.

What about XOR?

When you XOR a number with a string of bits, wherever you place a '1' in the second number, the result will be the OPPOSITE of what is in the first number:

0 XOR 1 = 1, 1 XOR 1 = 0

Where you place a '0' in the second number, the result will be the SAME:

0 XOR 0 = 0, 1 XOR 0 = 1

An interesting fact that I promised when talking about ASCII: the bit worth 32 (the sixth bit counting from the right, '00100000') of an ASCII letter determines its case. If you want to make a letter capital, clear that bit; if you want to make it lowercase, set it; and if you want to reverse the case, toggle it with XOR.

If you read RFC 1459 you will notice that they refer to '{', '}', and '|' as the lowercase equivalents of '[', ']', and '\'. This is certainly true at least in the ASCII charts :)
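The case trick can be tried out with this Python sketch. In ASCII, 'A' is 65 ('01000001') and 'a' is 97 ('01100001'), so the lowercase letter is the one with the 32 bit set. The helper names are hypothetical, and they do no checking, so they only make sense for letters and for the bracket characters from the RFC.

```python
CASE_BIT = 0b00100000  # 32

def to_lower(ch):
    """Set the 32 bit: 'A' (65) becomes 'a' (97)."""
    return chr(ord(ch) | CASE_BIT)

def to_upper(ch):
    """Clear the 32 bit: 'a' (97) becomes 'A' (65)."""
    return chr(ord(ch) & ~CASE_BIT)

def swap_case(ch):
    """Toggle the 32 bit with XOR."""
    return chr(ord(ch) ^ CASE_BIT)

print(to_lower("A"), to_upper("a"), swap_case("q"))  # a A Q
print(to_lower("["), to_lower("]"), to_lower("\\"))  # { } |
```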

--- eof. ---

The above is copied straight out of sections from the webboard posts; it may need some formatting or whatever, but I'm not really an HTML man ;) E-mail me with comments, corrections, or suggestions at [email protected].

Also of interest before closing, Paladin posted this to the board in response to my first post:

>> This is because when Microsoft created Windows (tm) they decided to rearrange all the character assignments. Go figure. (Actually I think it may have something to do with newer versions of Windows and UNICODE, but I don't know, if you know why they did this please enlighten me :)

This is because when they designed the Unicode standard, they moved the high 128 characters out of the first 256 bit range, and up into something like 2000. The reason they did this was to allow internationalization without forcing the developer to move to a 16 bit character model.

My opinion on this decision? Bad move. They should have forced developers wishing to internationalize to implement the 16 bit character model (Unicode wide character) rather than giving them an easy out (in the form of a swappable high 128 bit range). Why? Well look around you... We've had the unicode standard in place for how long now, and still barely anyone has bothered to implement it in an app. This leads to all sorts of annoying bs, like the fact that you can't send 16 bit characters across IRC, and are thus forced to choose between a font which is compatible with other users (the windows character set) or one which supports the console drawing capabilities of terminal applications (ANSI character set).

Blame Microsoft. That's what I do. Or blame Canada. That works too.

--

OK, well that's about it, hope you got -something- useful out of all that :)